175 research outputs found

    Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm

    In this paper, we study a method to sample from a target distribution $\pi$ over $\mathbb{R}^d$ having a positive density with respect to the Lebesgue measure, known up to a normalisation factor. This method is based on the Euler discretization of the overdamped Langevin stochastic differential equation associated with $\pi$. For both constant and decreasing step sizes in the Euler discretization, we obtain non-asymptotic bounds for the convergence to the target distribution $\pi$ in total variation distance. Particular attention is paid to the dependence on the dimension $d$, to demonstrate the applicability of this method in the high-dimensional setting. These bounds improve and extend the results of (Dalalyan 2014).
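The Euler discretization underlying the algorithm reads $X_{k+1} = X_k - \gamma \nabla U(X_k) + \sqrt{2\gamma}\, Z_{k+1}$ with i.i.d. standard Gaussian $Z_k$. A minimal sketch, assuming a standard Gaussian target; the step size, iteration count and target below are illustrative choices, not the paper's:

```python
import numpy as np

def ula(grad_U, x0, step, n_iters, rng):
    """Unadjusted Langevin Algorithm: Euler-Maruyama discretization of
    the overdamped Langevin SDE dX_t = -grad U(X_t) dt + sqrt(2) dB_t."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Standard Gaussian target on R^5: U(x) = ||x||^2 / 2, so grad U(x) = x.
rng = np.random.default_rng(0)
samples = ula(lambda x: x, x0=np.full(5, 3.0), step=0.05, n_iters=20000, rng=rng)
```

After a burn-in, the empirical moments of the chain approximate those of $\pi$ up to a discretization bias controlled by the step size.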

    High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm

    We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density with respect to the Lebesgue measure on $\mathbb{R}^d$, known up to a normalization constant: $x \mapsto \pi(x) = \mathrm{e}^{-U(x)} / \int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \, \mathrm{d}y$. Such a problem occurs naturally, for example, in Bayesian inference and machine learning. Under the assumptions that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity, in Wasserstein distance of order $2$ and in total variation distance, of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence of these bounds on the dimension of the state space is explicit. The convergence of an appropriately weighted empirical measure is also investigated, and bounds for the mean square error and an exponential deviation inequality are reported for functions which are measurable and bounded. An illustration on Bayesian inference for binary regression is presented to support our claims. Comment: Supplementary material available at https://hal.inria.fr/hal-01176084/. arXiv admin note: substantial text overlap with arXiv:1507.0502
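The decreasing-step-size variant and the weighted empirical measure can be sketched as follows; the target, step-size schedule and test function below are illustrative assumptions, not the paper's experiment:

```python
import numpy as np

def ula_weighted(grad_U, test_fn, x0, gamma0, alpha, n_iters, rng):
    """ULA with decreasing steps gamma_k = gamma0 / k^alpha, together with
    the step-size-weighted empirical average of test_fn along the chain."""
    x = np.asarray(x0, dtype=float)
    num, den = 0.0, 0.0
    for k in range(1, n_iters + 1):
        gamma = gamma0 / k**alpha
        x = x - gamma * grad_U(x) + np.sqrt(2.0 * gamma) * rng.standard_normal(x.size)
        # Weight each iterate by its step size in the empirical average.
        num += gamma * test_fn(x)
        den += gamma
    return num / den

# Standard Gaussian target on R^3, for which E[||X||^2] = 3.
rng = np.random.default_rng(0)
est = ula_weighted(lambda x: x, lambda x: float(np.sum(x**2)),
                   x0=np.zeros(3), gamma0=0.1, alpha=1/3, n_iters=50000, rng=rng)
```

With steps decreasing to zero, the weighted average converges to the expectation under $\pi$ without the fixed discretization bias of a constant step size.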

    Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains

    We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step size. While the detailed analysis is only performed for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the dependence on initial conditions, the effect of noise and the step size, as well as the lack of convergence in the general (non-quadratic) case. For this analysis, we bring tools from Markov chain theory into the analysis of stochastic gradient descent. We then show that Richardson-Romberg extrapolation may be used to get closer to the global optimum, and we show empirical improvements with the new extrapolation scheme.
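The extrapolation idea can be illustrated on a toy problem: run averaged SGD at step sizes $\gamma$ and $2\gamma$ and combine the two averages so that the first-order bias in the step size cancels. The objective, noise model and constants below are hypothetical, not the paper's:

```python
import numpy as np

def averaged_sgd(grad, x0, step, n_iters, rng, noise_std=1.0):
    """Constant step-size SGD with additive gradient noise; returns the
    Polyak-Ruppert average of the iterates."""
    x, total = x0, 0.0
    for _ in range(n_iters):
        x = x - step * (grad(x) + noise_std * rng.standard_normal())
        total += x
    return total / n_iters

# Hypothetical non-quadratic objective f(x) = exp(x) - x, minimized at 0;
# the averaged iterates carry an O(step) bias away from the optimum.
grad = lambda x: np.exp(x) - 1.0
rng = np.random.default_rng(2)
avg_g  = averaged_sgd(grad, 0.0, 0.1, 200000, rng)
avg_2g = averaged_sgd(grad, 0.0, 0.2, 200000, rng)
rr = 2.0 * avg_g - avg_2g  # Richardson-Romberg extrapolation
```

The combination `2 * avg_g - avg_2g` cancels the term linear in the step size, leaving only higher-order bias and Monte Carlo noise.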

    Analysis of Langevin Monte Carlo via convex optimization

    In this paper, we provide new insights into the Unadjusted Langevin Algorithm. We show that this method can be formulated as a first-order optimization algorithm for an objective functional defined on the Wasserstein space of order $2$. Using this interpretation and techniques borrowed from convex optimization, we give a non-asymptotic analysis of this method for sampling from a log-concave smooth target distribution on $\mathbb{R}^d$. Based on this interpretation, we propose two new methods for sampling from a non-smooth target distribution, which we analyze as well. Moreover, these new algorithms are natural extensions of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm, which is a popular extension of the Unadjusted Langevin Algorithm. Like SGLD, they rely only on approximations of the gradient of the target log density and can be used for large-scale Bayesian inference.
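A minimal sketch of SGLD in this spirit, on a hypothetical Gaussian location model where the minibatch gradient is an unbiased estimate of $\nabla U$; the data, batch size and step size are illustrative assumptions:

```python
import numpy as np

def sgld(data, x0, step, batch_size, n_iters, rng):
    """Stochastic Gradient Langevin Dynamics for a Gaussian location model:
    U(x) = sum_i (x - y_i)^2 / 2, with grad U estimated on random minibatches."""
    n = data.size
    x = x0
    samples = np.empty(n_iters)
    for k in range(n_iters):
        batch = rng.choice(data, size=batch_size, replace=False)
        grad_est = n * (x - batch.mean())  # unbiased estimate of grad U(x)
        x = x - step * grad_est + np.sqrt(2.0 * step) * rng.standard_normal()
        samples[k] = x
    return samples

# Hypothetical toy data; with a flat prior the posterior is N(mean(y), 1/n).
rng = np.random.default_rng(3)
y = rng.normal(1.0, 1.0, size=100)
samples = sgld(y, x0=0.0, step=1e-3, batch_size=10, n_iters=20000, rng=rng)
```

Only minibatch gradients of the log density are needed per iteration, which is what makes the scheme attractive for large-scale Bayesian inference.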

    Copula-like Variational Inference

    This paper considers a new family of variational distributions motivated by Sklar's theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of the state space. The proposed variational densities can be seen as arising from these copula-like densities used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows; the latter correspond to a rotation of the marginals with complexity $\mathcal{O}(d \log d)$. We provide empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian neural networks. Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
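A loose sketch of the sampling path described above; the uniform base and the Walsh-Hadamard transform below are stand-ins for the paper's copula-like base density and sparse rotation matrices, chosen only to illustrate the $\mathcal{O}(d \log d)$ rotation step:

```python
import numpy as np
from statistics import NormalDist

def fwht(x):
    """Normalized fast Walsh-Hadamard transform: an orthogonal rotation
    applied in O(d log d) time (d must be a power of two)."""
    x = x.copy()
    d, h = x.size, 1
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(d)

# Draw a point on the hypercube (uniform base as a placeholder), push it
# through Gaussian quantile functions, then rotate the marginals with a
# fast structured orthogonal transform.
rng = np.random.default_rng(4)
u = rng.uniform(size=8)                              # base point on [0,1]^8
z = np.array([NormalDist().inv_cdf(t) for t in u])   # Gaussian quantiles
signs = rng.choice([-1.0, 1.0], size=8)
sample = fwht(signs * z)                             # O(d log d) rotation
```

Because the transform is orthogonal, it rotates the marginals without changing the norm of the sample, and it never materializes a dense $d \times d$ matrix.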

    Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo

    This paper presents a detailed theoretical analysis of the Langevin Monte Carlo sampling algorithm recently introduced in Durmus et al. (Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau, 2016) when applied to log-concave probability distributions that are restricted to a convex body $\mathsf{K}$. This method relies on a regularisation procedure involving the Moreau-Yosida envelope of the indicator function associated with $\mathsf{K}$. Explicit convergence bounds in total variation norm and in Wasserstein distance of order $1$ are established. In particular, we show that the complexity of this algorithm, given a first-order oracle, is polynomial in the dimension of the state space. Finally, some numerical experiments are presented to compare our method with competing MCMC approaches from the literature.
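A minimal sketch of the regularisation idea: the gradient of the Moreau-Yosida envelope of the indicator of $\mathsf{K}$ is $(x - \mathrm{proj}_{\mathsf{K}}(x))/\lambda$, which is added to the Langevin drift. The box-shaped $\mathsf{K}$ and all constants below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def myula(grad_U, proj_K, x0, step, lam, n_iters, rng):
    """Moreau-Yosida regularised ULA: the indicator of K is replaced by its
    Moreau-Yosida envelope, whose gradient is (x - proj_K(x)) / lam."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        drift = grad_U(x) + (x - proj_K(x)) / lam
        x = x - step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Uniform distribution on the box K = [-1, 1]^2: U = 0 inside K and the
# projection onto K is a coordinate-wise clip.
rng = np.random.default_rng(5)
samples = myula(lambda x: np.zeros_like(x), lambda x: np.clip(x, -1.0, 1.0),
                x0=np.zeros(2), step=5e-3, lam=1e-2, n_iters=100000, rng=rng)
```

The chain lives on all of $\mathbb{R}^d$ but excursions outside $\mathsf{K}$ are pushed back by the envelope gradient, with a boundary layer of width roughly $\sqrt{\lambda}$.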